A Hybrid Architecture for Plagiarism Detection

نویسنده

  • Demetrios G. Glinos
چکیده

We present a hybrid plagiarism detection architecture that operates on the two principal forms of text plagiarism. For order-preserving plagiarism, such as paraphrasing and modified cut-and-paste, it contains a text alignment component that is robust against word choice and phrasing changes that do not alter the basic ordering. And for non-order based plagiarism, such as random phrase reordering and summarization, it contains a two-stage cluster detection component. The first stage identifies a maximal passage in the suspect document that is related to the source document, while the second stage determines whether the suspect passage corresponds to the entire source document or just to a passage within it. Three implementations of this architecture, involving a common text alignment component and three different cluster detection components, participated in the PAN 2014 Text Alignment task and performed very well, achieving very high precision, recall, and overall plagiarism detection scores.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

متن کامل

Hybrid System For Plagiarism Detection

The Internet boom in recent years has increased the interest in the field of plagiarism detection. A lot of documents are published on the Net everyday and anyone can access and plagiarize them. Of course, checking all cases of plagiarism manually is an unfeasible task. Therefore, it is necessary to create new systems that are able to automatically detect cases of plagiarism produced. In this p...

متن کامل

Hybrid Segmentation Prototype for Arabic Text-Based Documents: Towards Plagiarism Detection

The contribution of this work relates to the field of Arabic text-based document analysis for the detection of plagiarism. This analysis will be carried out according to the triadic computation model of document similarity. The authors propose a hybrid segmentation prototype for Arabic text-based documents that links different processing steps in order to generate the similarity rate between th...

متن کامل

Plagiarism Detection in Arabic Documents: Approaches, Architecture and

Plagiarism detection is a sensitive field of research which has gained lot of interest in the past few years. Although plagiarism detection systems are developed to check text in a variety of languages, they perform better when they are dedicated to check a specific language as they take into account the specificity of the language which leads to better quality results. Query optimization and d...

متن کامل

Plagiarism Detection through Internet using Hybrid Artificial Neural Network and Support Vectors Machine

Currently, most of the plagiarism detections are using similarity measurement techniques. Basically, a pair of similar sentences describes the same idea. However, not all like that, there are also sentences that are similar but have opposite meanings. This is one problem that is not easily solved by use of the technique similarity. Determination of dubious value similarity threshold on similari...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014